Data

The original data is stored in the file lxb-wz2c-m46p (1).xlsx. After excluding the first 100 rows, a random sample of 100 entries was drawn from the remaining rows with seed 2025. Each sampled comment was read by a human coder to establish a ground truth: a summary of the comment text was extracted and recorded in the fields Document.ID, Tone, Tone_justify, Tone_quote, Commenter_Role, Specialty, Patient_cost, Patient_quality, Provider_pay, Provider_quality, Other, Justification, Quote, and Role_category, then saved to a CSV file named fill_sample.csv. The file contains 13 discrete variables and 100 rows, for a total of 1,300 cells. Of these cells, 18.1% are empty, meaning that no useful information could be extracted.
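The sampling step described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the code that was actually run: `full_data` is a made-up stand-in for the spreadsheet, and the variable names are hypothetical.

```r
# Toy stand-in for the spreadsheet (hypothetical data, 500 rows).
full_data <- data.frame(
  Document.ID = sprintf("DOC-%04d", 1:500),
  stringsAsFactors = FALSE
)

# Step 1: exclude the first 100 rows.
remaining <- full_data[-(1:100), , drop = FALSE]

# Step 2: fix the seed, then draw 100 rows without replacement.
set.seed(2025)
picked <- sample(nrow(remaining), size = 100)
sampled <- remaining[picked, , drop = FALSE]

nrow(sampled)
```

A fixed seed reproduces the same draw only when the same sampling call and the same R sampling defaults are used, so the R version is worth recording alongside the seed.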

Missing value
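The share of empty cells quoted in the Data section (18.1% of 1,300 cells) can be checked overall and per variable along the following lines. The data frame `toy` is a hypothetical miniature, and treating both NA and the empty string as "empty" is an assumption about how the CSV was filled in.

```r
# Hypothetical miniature of the coded sample; NOT the study data.
toy <- data.frame(
  Tone      = c("negative", "neutral", NA, "very_negative"),
  Specialty = c("", "emergency", NA, ""),
  Quote     = c("a", "b", "c", NA),
  stringsAsFactors = FALSE
)

# A cell counts as empty if it is NA or an empty string (assumption).
is_empty <- is.na(toy) | toy == ""

overall_share <- mean(is_empty)      # share of empty cells in the whole table
per_column    <- colMeans(is_empty)  # share of empty cells per variable

round(100 * overall_share, 1)
round(100 * per_column, 1)
```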

Tone and Tone_justify Description & Exploration

Commenter Roles Explanation and Exploration

Commenter Role with Specialties
Commenter_Role    Specialties
anesthesiologist  -
attorney          -
caregiver         -
civil_servant     -
nurse             register, -
organization      ngo, coalition, labor_union
patient           -, military veteran, kidney transplant recipient
pharmacist        -, retired
physician         emergency, rheumatology, -, terminated
provider          -, emergency
psychotherapist   emergency, -, terminated

[1] "physician" "nurse" "provider" "pharmacist"
[5] "psychotherapist"

Examples of Comments with Hard-to-Identify Commenter Roles

These comments lack clear identifying information, which makes the commenter role difficult to classify. As a result, most of them are labeled NA in Commenter_Role.


Exploration of Correlation Between Commenter Role and Tone

Association Between Commenter Role and Tone

  • A chi-square test indicates that Commenter_Role is significantly associated with the Tone of the comment (\(p<0.05\)).

  • The Contingency Table Heatmap shows counts for each combination of Role ("patient", "provider", "other", "no information") and Tone category ("negative", "very negative", "neutral", "positive").
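The test in the bullets above can be sketched with base R's `chisq.test()`. The counts below are invented for illustration and are not the study's contingency table. With the real data, the table would presumably be built with something like `table(df$Role_category, df$Tone)`, where the column names are taken from the Data section and `df` is an assumed name for the coded sample.

```r
# Hypothetical counts, NOT the study data: rows are roles, columns are tones.
counts <- matrix(
  c(20, 25,  8,  7,
    30, 18,  7,  5,
     8,  9, 12, 11),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Role = c("patient", "provider", "other"),
    Tone = c("negative", "very negative", "neutral", "positive")
  )
)

# Pearson chi-square test of independence between role and tone.
test <- chisq.test(counts)

test$statistic   # chi-square statistic
test$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
test$p.value
test$expected    # expected counts under independence
```

When some expected counts are small, for example when a tone has very few comments, `chisq.test(counts, simulate.p.value = TRUE)` or `fisher.test()` may be worth considering as a cross-check.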

Distribution of Tone per Role Category

The histogram showing the distribution of tone within each role category reveals the following:

  • For commenters with no role information, the negative and very negative tones account for the largest shares. Taken together with the pattern across role categories, namely that those directly experiencing PE tend to show negative or very negative tones, this suggests that many commenters without role information may in fact be patients or providers, if they had to be classified as patient, provider, or other.

  • For both providers and patients, negative and very negative tones make up a larger share than neutral or positive tones. Compared with patients, providers have a larger share of negative relative to very negative comments. This may be due to providers’ professional training and occupational discipline, which help them regulate emotional expression when describing issues.

  • For Other, the positive tone has the smallest share, while the negative, very negative, and neutral tones show no marked imbalance. This suggests that people who do not directly experience PE tend not to hold supportive attitudes toward PE, which would indicate that PE’s impact is broad and that its societal effect is perceived as primarily negative.
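The within-role comparison in the bullets above amounts to reading row proportions of the role-by-tone table. A minimal sketch with `prop.table()` follows; the counts are invented and only illustrate the calculation.

```r
# Hypothetical counts, NOT the study data.
counts <- matrix(
  c(20, 25,  8,  7,
    30, 18,  7,  5,
     8,  9, 12, 11),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Role = c("patient", "provider", "other"),
    Tone = c("negative", "very negative", "neutral", "positive")
  )
)

# margin = 1 divides each row by its row total, so every role sums to 1.
row_share <- prop.table(counts, margin = 1)

# Percentages of each tone within each role category.
round(100 * row_share, 1)
```

Comparing rows of `row_share` rather than raw counts keeps role categories of different sizes on the same scale.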

Exploration of Comment Length

word_count records the length of the comment text in words. For comments stored in the attached files, I counted the words manually. For comments stored directly in the original data cell, I counted the words with the str_count() function in R.
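The automatic count can be sketched as below. The exact pattern passed to str_count() is not stated in the text, so this sketch assumes that a word is any run of non-whitespace characters, and it uses base R so that it runs without extra packages. With stringr installed, `str_count(x, "\\S+")` should give the same counts. The helper name `count_words` is hypothetical.

```r
# Count words as runs of non-whitespace characters (assumed definition).
count_words <- function(x) {
  matches <- regmatches(x, gregexpr("\\S+", x))
  lengths(matches)
}

comments <- c("one two three", "  leading  spaces ", "")
word_count <- count_words(comments)
word_count
```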

Overall Summary of Word Count
count  missing  min  q1     median  mean    q3      max   sd
100    0        1    46.25  125.5   333.41  226.25  6297  806.8054
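A summary row like the one above, and the per-group tables that follow, can be produced with a small helper. This is a sketch rather than the code behind the reported numbers: `summarise_words` is a hypothetical name, `count` is assumed to include missing values, and the quartiles use R's default quantile type.

```r
# Summary statistics for a numeric vector of word counts.
summarise_words <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(
    count   = length(x),        # assumed to include missing values
    missing = sum(is.na(x)),
    min     = min(x, na.rm = TRUE),
    q1      = q[1],
    median  = q[2],
    mean    = mean(x, na.rm = TRUE),
    q3      = q[3],
    max     = max(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE)  # NA when a group has a single value
  )
}

# Hypothetical toy data, NOT the study data.
words <- c(3, 10, 25, 40, 120, 7)
tone  <- c("neutral", "negative", "negative",
           "very_negative", "very_negative", "positive")

overall <- summarise_words(words)
by_tone <- do.call(rbind, lapply(split(words, tone), summarise_words))
by_tone
```

A group with a single comment gets an sd of NA, which matches the NA reported for the one-comment tone group in the table by Tone.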


Summary of Word Count by Tone
Tone           count  min  q1     median  mean      q3      max   sd
negative       44     1    77.25  142     387.7727  271.75  6297  1002.44962
neutral        8      3    3.00   12      37.8750   21.25   227   76.89592
positive       1      3    3.00   3       3.0000    3.00    3     NA
very_negative  47     3    68.00  130     339.8511  173.50  3724  663.49463

Kruskal-Wallis Test for Word Count by Tone
Statistic  Df  P_value
13.693     3   0.003354

Dunn's Test for Word Count by Tone
Comparison                Z           P.unadj      P.adj
negative - neutral         3.3397435  0.000838558  0.005031
negative - positive        1.7401757  0.081828165  0.490969
neutral - positive         0.4489645  0.653457281  1.000000
negative - very_negative   0.7190723  0.472096352  1.000000
neutral - very_negative   -2.9618691  0.003057778  0.018347
positive - very_negative  -1.5921502  0.111350954  0.668106
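The two-step procedure used here, an overall Kruskal-Wallis test followed by pairwise Dunn comparisons, can be sketched in base R. The text does not say which package produced the Dunn tables, so `dunn_pairs` below is a hypothetical hand-written helper that follows the usual Dunn z statistic with a tie correction. The Bonferroni adjustment is an assumption, chosen because the reported P.adj values equal P.unadj multiplied by the number of comparisons, capped at 1. The toy data are invented.

```r
# Pairwise Dunn comparisons on ranks (sketch; helper name is hypothetical).
dunn_pairs <- function(x, g, adjust = "bonferroni") {
  keep <- !is.na(x) & !is.na(g)
  x <- x[keep]
  g <- factor(g[keep])
  n_total <- length(x)
  r <- rank(x)                          # mid-ranks for tied values
  by_group <- split(r, g)
  mean_rank <- sapply(by_group, mean)
  n <- lengths(by_group)
  ties <- as.numeric(table(x))          # sizes of tied blocks
  sigma2 <- n_total * (n_total + 1) / 12 -
    sum(ties^3 - ties) / (12 * (n_total - 1))
  pairs <- combn(levels(g), 2)          # one column per pair of groups
  z <- apply(pairs, 2, function(p) {
    unname((mean_rank[p[1]] - mean_rank[p[2]]) /
             sqrt(sigma2 * (1 / n[p[1]] + 1 / n[p[2]])))
  })
  p_unadj <- 2 * pnorm(-abs(z))         # two-sided p-value
  data.frame(
    Comparison = paste(pairs[1, ], "-", pairs[2, ]),
    Z = z,
    P.unadj = p_unadj,
    P.adj = p.adjust(p_unadj, method = adjust)
  )
}

# Hypothetical toy data, NOT the study data.
toy <- data.frame(
  word_count = c(150, 12, 90, 300, 5, 130, 400, 8, 75, 60, 3, 220),
  Tone = rep(c("negative", "neutral", "very_negative"), times = 4)
)

kw <- kruskal.test(word_count ~ Tone, data = toy)  # overall test
kw$statistic; kw$parameter; kw$p.value

dunn_pairs(toy$word_count, toy$Tone)               # pairwise follow-up
```

The same two calls apply unchanged to the other grouping variables in this section by swapping the grouping column.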

Summary of Word Count by Role_category
Role_category  count  min  q1     median  mean      q3      max   sd
other          12     1    3.0    62.0    257.6667  238.00  1741  497.5126
patient        20     3    138.5  178.5   258.7000  264.25  1199  262.9307
provider       29     3    84.0   126.0   307.7931  294.00  2720  516.0690
NA             39     3    31.5   78.0    414.0769  132.50  6297  1175.3598

Kruskal-Wallis Test for Word Count by Role_category
Statistic  Df  P_value
3.658      2   0.1606

Dunn's Test for Word Count by Role_category
Comparison          Z           P.unadj     P.adj
other - patient     -1.9095797  0.05618735  0.1686
other - provider    -1.1887238  0.23454838  0.7036
patient - provider   0.9951801  0.31964867  0.9589

Summary of Word Count by Concern Level
Concern_Level  count  min  q1      median  mean       q3      max   sd
No Concern     9      3    5.00    13.0    21.11111   25.00   61    20.43554
Low            19     19   57.50   118.0   382.05263  153.50  3724  894.03781
Medium         40     3    72.75   131.0   402.80000  259.50  6297  1053.41528
High           22     66   121.00  155.5   364.81818  358.50  1805  476.34298
Highest        10     1    3.00    53.5    175.40000  196.75  756   262.97917

Kruskal-Wallis Test for Word Count by Concern_Level
Statistic  Df  P_value
21.393     4   0.0002646

Dunn's Test for Word Count by Concern_Level
Comparison            Z           P.unadj       P.adj
High - Highest         2.4292043  0.0151320013  0.1513200
High - Low             1.6348864  0.1020728561  1.0000000
Highest - Low         -1.0608067  0.2887777588  1.0000000
High - Medium          1.2417371  0.2143335735  1.0000000
Highest - Medium      -1.6881877  0.0913752034  0.9137520
Low - Medium          -0.6547435  0.5126328864  1.0000000
High - No Concern      4.2979731  0.0000172367  0.0001724
Highest - No Concern   1.6849391  0.0920003204  0.9200032
Low - No Concern       2.9373759  0.0033100264  0.0331003
Medium - No Concern    3.7162380  0.0002022111  0.0020221
write.csv(df, "100_entries_sample.csv")